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RARE-EVENT PROBABILITIES FOR A HEAVY- TAILED 
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Abstract. In this paper a method based on a Markov chain Monte Carlo 
(MCMC) algorithm is proposed to compute the probability of a rare event. 
The conditional distribution of the underlying process given that the rare event 
occurs has the probability of the rare event as its normalizing constant. Using 
the MCMC methodology a Markov chain is simulated, with that conditional 
distribution as its invariant distribution, and information about the normal- 
izing constant is extracted from its trajectory. The algorithm is described 
in full generality and applied to the problem of computing the probability 
that a heavy-tailed random walk exceeds a high threshold. An unbiased esti- 
mator of the reciprocal probability is constructed whose normalized variance 
vanishes asymptotically. The algorithm is extended to random sums and its 
performance is illustrated numerically and compared to existing importance 
sampling algorithms. 
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1. Introduction 

In this paper a Markov chain Monte Carlo (MCMC) methodology is proposed 
for computing the probability of a rare event. The basic idea is to use an MCMC 
algorithm to sample from the conditional distribution given the event of interest 
and then extract the probability of the event as the normalizing constant. The 
methodology will be outlined in full generality and exemplified in the setting of 
computing hitting probabilities for a heavy-tailed random walk. 

A rare-event simulation problem can be often be formulated as follows. Consider 
a sequence of random variables X^ 2 ' , . . . , each of which can be sampled repeat- 
edly by a simulation algorithm. The objective is to estimate p( n > = P(X^ n ' 6 A), 
for some large n, based on a sample X^ n \ . . . , X^!^. It is assumed that the prob- 
ability P(X<") £4)^0, as n — > oo, so that the event {!<"' e A} can be thought 
of as rare. The solution to the problem consists of finding a family of simulation 
algorithms and corresponding estimators whose performance is satisfactory for all 
n. For unbiased estimators p^ of p^ a useful performance measure is the relative 
error: 

RE (n) _ Var(ig) 

(p(n))2 • 

An algorithm is said to have vanishing relative error if the relative error tends to 
zero as n — > oo and bounded relative error if the relative error is bounded in n. 
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It is well known that the standard Monte Carlo algorithm is inefficient for com- 
puting rare-event probabilities. As an illustration, consider the standard Monte 
Carlo estimate 

t=0 

of p< n > = P(XW G A) based on independent replicates X^ n) , . . . , Xp} v The 
relative error of the Monte Carlo estimator is 

Var^" 3 ) p(n)(l_ p (n)) i i 

— — — ^ rv~) 

(p("))2 T(p(™)) 2 Tp(») T 

as n — >• oo, indicating that the performance deteriorates when the event is rare. 

A popular method to reduce the computational cost is importance sampling, see 
e.g. [2]. In importance sampling the random variables Xq , . . . , are sampled 

independently from a different distribution, say G^ , instead of the original distri- 
bution F^ n \ The importance sampling estimator is defined as a weighted empirical 
estimator, 

t=o 

where — dF^ /dG^ 1 ' is the likelihood ratio, which is assumed to exist on A. 
The importance sampling estimator p^ is unbiased and its performance depends 
on the choice of the sampling distribution G^ . The optimal sampling distribution 
is called the zero- variance distribution and is simply the conditional distribution, 

fl->(.) = P(A-W 6 -|.V» 6 M)= P ' A ""^,- nM » . 

In this case the likelihood ratio weights are equal to p^ which implies that p^ 
has zero variance. Clearly, the zero-variance distribution cannot be implemented 
in practice, because pl n J is unknown, but it serves as a starting point for selecting 
the sampling distribution. A good idea is to choose a sampling distribution G^ 
that approximates the zero- variance distribution and such that the random variable 
X^ n ' can easily be sampled from G^ n \ the event {iW £ A} is more likely under 
the sampling distribution G*™) than under the original F^ n \ and the likelihood 
ratio is unlikely to become too large. Proving efficiency (e.g. bounded relative 
error) of an importance sampling algorithm can be technically cumbersome and 
often requires extensive analysis. 

The methodology proposed in this paper is also based on the conditional dis- 
tribution F^\ Because is known up to the normalizing constant pW it is 
possible to sample from F^' using an MCMC algorithm such as a Gibbs sampler 
or Metropolis- Hastings algorithm. The idea is to generate samples X^ n \ . . . , Xp™^ 1 
from a Markov chain with stationary distribution Fy and construct an estimator 
of the normalizing constant p( n \ An unbiased estimator of (p(")) _1 is constructed 
from a known probability density on A, which is part of the design, and the 
original density f( n > of X^ by 

qT = T h /<»>(*J B) ) ' ( } 
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The performance of the estimator depends both on the choice of the density 
and on the ergodic properties of the MCMC sampler used in the implementation. 
Roughly speaking the rare-event properties, as n — > oo, are controlled by the choice 
of v^ n ' and the large sample properties, as T — > oo, are controlled by the ergodic 
properties of the MCMC sampler. 

The computation of normalizing constants and ratios of normalizing constants 
in the context of MCMC is a reasonably well studied problem in the statistical 
literature, see e.g. [5] and the references therein. However, such methods have, to 
the best of our knowledge, not been studied in the context of rare-event simulation. 

To exemplify the MCMC methodology we consider the problem of computing the 
probability that a random walk S n = Yi+- • -+Y n , where Y%, . . . , Y n are nonnegative, 
independent, and heavy-tailed random variables, exceeds a high threshold a n . This 
problem has received some attention in the context of conditional Monte Carlo 
algorithms [TJ |3] and importance sampling algorithms [HI [HI [SJ H] ■ 

In this paper a Gibbs sampler is presented for sampling from the conditional 
distribution P((Yi, . . . , Y n ) G • | S n > a n ). The resulting Markov chain is proved to 
be uniformly ergodic. An estimator for (j/™)) -1 of the form (jTTTJ) is suggested with 
t)(") as the conditional density of (Yy, . . . , Y n ) given max{Yi, . . . , Y n } > a n . The 
estimator is proved to have vanishing normalized variance when the distribution of 
Y\ belongs to the class of subexponential distributions. The proof is elementary 
and is completed in a few lines. This is in sharp contrast to efficiency proofs 
for importance sampling algorithms for the same problem, which require more 
restrictive assumptions on the tail of Y\ and tend to be long and technical [8) 03 S] ■ 
An extension of the algorithm to a sum with a random number of steps is also 
presented. 

Here follows an outline of the paper. The basic methodology and a heuristic 
efficiency analysis for computing rare-event probabilities is described in Section [2] 
The general formulation for computing expectations is given in Section[3] along with 
a precise formulation of the efficiency criteria. Section 0] contains the design and 
efficiency results for the estimator for computing hitting probabilities for a heavy- 
tailed random walk, with deterministic and random number of steps. Section [5] 
presents numerical experiments and compares the efficiency of the MCMC estima- 
tor against an existing importance sampling algorithm and standard Monte Carlo. 
The MCMC estimator has strikingly better performance than existing importance 
sampling algorithms. 

2. Computing rare-event probabilities by Markov chain Monte Carlo 

In this section an algorithm for computing rare-event probabilities using Markov 
chain Monte Carlo (MCMC) is presented and conditions that ensure good conver- 
gence are discussed in a heuristic fashion. A more general version of the algorithm, 
for computing expectations, is provided in Section[3]along with a precise asymptotic 
efficiency criteria. 

2.1. Formulation of the algorithm. Let A be a real- valued random variable 
with distribution F and density / with respect to the Lebesgue measure. The 
problem is to compute the probability 

p = P(A G A) = [ dF. (2.1) 

J A 
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The event {X G ^4} is thought of as rare in the sense that p is small. Let Fa be 
the conditional distribution of X given X E A. The density of Fa is given by 



dF A 
dx 



(x) 



f(x)I{x G A} 



P 



(2.2) 



Consider a Markov chain (X t ) t >o whose invariant density is given by (|2.2[) . Such a 
Markov chain can be constructed by implementing an MCMC algorithm such as a 
Gibbs sampler or a Metropolis-Hastings algorithm, see e.g. [21 HP). 

To construct an estimator for the normalizing constant p, consider a non-negative 
function v, which is normalized in the sense that J A v(x)dx = 1. The function v 
will be chosen later as part of the design of the estimator. For any choice of v the 
sample mean, 

]_^v(X t )I{X t 
T 



EA) 



t=o 



f(X t 



can be viewed as an estimate of 
~v(X)I{X G A}' 



Thus, 



f(X) 



T-l 



v(x) f(x) 
f{x) p 



1 



1 



dx — — v(x)dx = — . 



It = 



^^2u(X t ), where u(X t ) = 



v(X t )I{X t G A} 
f(Xt) 



(2.3) 



1 is an estimator of p. 



is an unbiased estimator of q = p~ . Then px = q T ' 

The expected value above is computed under the invariant distribution Fa of 
the Markov chain. It is implicitly assumed that the sample size T is sufficiently 
large that the burn-in period, the time until the Markov chain reaches stationarity, 
is negligible or alternatively that the burn-in period is discarded. Another remark 
is that it is theoretically possible that all the terms in the sum in (|2.3p are zero, 
leading to the estimate qr = and then px — oo. To avoid such nonsense one can 
simply take pr as the minimum of q^ 1 and one. 

There are two essential design choices that determine the performance of the 
algorithm: the choice of the function v and the design of the MCMC sampler. The 
function v influences the variance of u(X t ) in (|2.3p and is therefore of main concern 
for controlling the rare-event properties of the algorithm. It is desirable to take v 
such that the normalized variance of the estimator, given by p 2 Var(§r), is not too 
large. The design of the MCMC sampler, on the other hand, is crucial to control the 
dependence of the Markov chain and thereby the convergence rate of the algorithm 
as a function of the sample size. To speed up simulation it is desirable that the 
Markov chain mixes fast so that the dependence dies out quickly. 



2.2. Controlling the normalized variance. This section contains a discussion 
on how to control the performance of the estimator qr by controlling its normalized 
variance. 

For the estimator qx to be useful it is of course important that its variance is 
not too large. When the probability p to be estimated is small it is reasonable 
to ask that Var(gr) is of size comparable to q 2 — p~ 2 , or equivalently, that the 
standard deviation of the estimator is roughly of the same size as p . To this end 
the normalized variance p 2 Var(gr) is studied. 
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Let us consider Var(gr)- With 

, . v(x)I\x £ A} 

it follows that 

1 T ~ 1 

P 2 Var FA (q T ) = p 2 Var FA (- £ u{X f )) 

4=0 

„ T-1 T-1 

= P 2 (y Var FA (n(X )) + — £ £ Cov Fa (u(X s ), u(X t )j) . (2.4) 

t=0 s=t+l 

Let us for the moment focus our attention on the first term. It can be written as 

2 2 

V - Var FA («(*„)) = |r (Et a KA ) 2 ] - E Fa [«(*„)] 2 ) 



,/(x) " / p 2 
g?x — 1 



7 1 / 2 (x) p p 2 

1 / f v 2 (x)p 



T\J A f(x) 

Therefore, in order to control the normalized variance the function v must be 
chosen so that J A yrjx dx is close to p _1 . An important observation is that the 
conditional density (|2.2I) plays a key role in finding a good choice of v. Letting v 
be the conditional density in (|2.2j) leads to 

« 2 (a:) f f 2 (x)I{x£A} ^ 1 /" 1 
"TTT 6 ^ / 27T1 = — / f(x)dx = -, 

which implies, 



2 

^Var Fjl («(*)) =0. 

This motivates taking v as an approximation of the conditional density (|2.2j) . 

If for some set B C A the probability P(A £ B) can be computed explicitly, 
then a candidate for v is 

the conditional density of X given X £ B. This candidate is likely to perform well 
if P(X £ B) is good approximation of p. Indeed, in this case 

v 2 (x) f f 2 (x)I{x £ B] 1 f u 1 



f{x) J A P(X 6 fl)V(i) P(^65) 2 i B JW P(Ae5)' 

which will be close to p _1 . 

Now, let us shift emphasis to the covariance term in (|2.4p . As the samples 
(At^Jg 1 form a Markov chain the At's are dependent. Therefore the covariance 



term in (12. 4[) is non-zero and may not be ignored. The crude upper bound 
Cov FA (u(X s ),u(X t )) < Var FA ( U (X )), 
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leads to the upper bound 

^rE E Cov FA (u(X s ),u(X t )) <p 2 (l- ±)Vm Fa (u(X )) 

t=0 s=t + l 

for the covariance term. This is a very crude upper bound as it does not decay 
to zero as T — >• oo. But, at the moment, the emphasis is on small p so we will 
proceed with this upper bound anyway. As indicated above the choice of v controls 
the term p 2 Vaxp A (u(Xo)) . We conclude that the normalized variance (|2.4p of the 
estimator gr is controlled by the choice of v when p is small. 

2.3. Ergodic properties. As we have just seen the choice of the function v con- 
trols the normalized variance of the estimator for small p. The design of the 
MCMC sampler, on the other hand, determines the strength of the dependence 
in the Markov chain. Strong dependence implies slow convergence which results in 
a high computational cost. The convergence rate of MCMC samplers can be an- 
alyzed within the theory of yj-irreducible Markov chains. Fundamental results for 
(p-irreducible Markov chains are given in [151 116] . We will focus on conditions that 
imply a geometric convergence rate. The conditions given below are well studied in 
the context of MCMC samplers. Conditions for geometric ergodicity in the context 
of Gibbs samplers have been studied by e.g. [5J [TSJ [TH] , and for Metropolis-Hastings 
algorithms by |14) . 

A Markov chain (X t ) t >o with transition kernel p(x, •) = P(X t+ i S • | Xt = x) 
is (/3-irreducible if there exists a measure tp such that J^. (x, •) <C ¥>(•); where 
p^'(x, •) = P(X t € • | Xo = x) denotes the t-step transition kernel and <C denotes 
absolute continuity. A Markov chain with invariant distribution 7r it is called ge- 
ometrically ergodic if there exists a positive function M and a constant r £ (0, 1) 
such that 

\\pV(x,-)-ir(-)\\ T v<M(xy, (2.5) 

where || • ||tv denotes the total- variation norm. This condition ensures that the 
distribution of the Markov chain converges at a geometric rate to the invariant 
distribution. If the function M is bounded, then the Markov chain is said to be 
uniformly ergodic. Conditions such as (I2.5[) may be difficult to establish directly 
and are therefore substituted by suitable minorization or drift conditions. A mi- 
norization condition holds on a set C if there exist a probability measure v, a 
positive integer to, and S > such that 

p^(x,B) > 6u(B), 

for all x € C and Borel sets B. In this case C is said to be a small set. Minorization 
conditions have been used for obtaining rigorous bounds on the convergence of 
MCMC samplers, see e.g. [T7] . 

If the entire state space is small, then the Markov chain is uniformly ergodic. 
Uniform ergodicity does typically not hold for Metropolis samplers, |14j Theorem 
3.1. Therefore useful sufficient conditions for geometric ergodicity are often given 
in the form of drift conditions [6j [14] . Drift conditions are also useful for estab- 
lishing central limit theorems for MCMC algorithms, see [TT] and the references 
therein. When studying simulation algorithms for random walks, in Section [U we 
will encounter Gibbs samplers that are uniformly ergodic. 
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2.4. Heuristic efficiency criteria. To summarize, the heuristic arguments given 
above lead to the following desired properties of the estimator. 

(1) Rare event efficiency: Construct an unbiased estimator $r of p _1 according 
to (|2.3p by finding a function v which approximates the conditional density 
(12.21) . The choice of v controls the normalized variance of the estimator. 

(2) Large sample efficiency: Design the MCMC sampler, by finding an ap- 
propriate Gibbs sampler or a proposal density in the Metropolis-Hastings 
algorithm, such that the resulting Markov chain is geometrically ergodic. 

3. The general formulation of the algorithm 

In the previous section an estimator, based on Markov chain Monte Carlo, was 
introduced for computing the probability of a rare event. In this section the same 
ideas are applied to the problem of computing an expectation. Here the setting is 
somewhat more general. For instance, there is no assumption that densities with 
respect to Lebesgue measure exist. 

Let X be a random variable with distribution F and h be a non-negative F- 
intcgrablc function. The problem is to compute the expectation 



e = E[h(X)] = J h{x)dF{x). 



In the special case when F has density / and h(x) = I{x 6 A} this problem reduces 
to computing the probability in (|2.1[) . 

The analogue of the conditional distribution in (|2.2j) is the distribution F^ given 

by 

Fh(B) = - I h(x)dF(x), for measurable sets B. 
9 Jb 

Consider a Markov chain (X t )t>o having Fh as its invariant distribution. To 
define an estimator of 9 , consider a probability distribution V with V <C Fh- 
Then it follows that V <C F and it is assumed that the density dV/dF is known. 
Consider the estimator of ( = 9~ x given by 



1 



t=o 
where 

1 dV 
u{x)= 9dF h {x) - 
Note that u does not depend on 9 because 7<F/, and therefore 

ldV I dV 

u{x) = edF h [x) = W)^ {xl 

for x such that h(x) > 0. The estimator (13.11) is a generalization of the estimator 
(|2.3p where one can think of v as the density of V with respect to Lebesgue measure. 
An estimator of 9 can then constructed as 9t — Ct 1 • 

The variance analysis of C,t follows precisely the steps outlined above in Section 
[5] The normalized variance is 

9 2 Var Fh (Cr) = ^ Var Fh (u(X Q )) + — £ £ Cov F)i (u(X s ), u(X t )) , (3.2) 

t=0 s=t+l 



s 
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where the first term can be rewritten as 

6 - Var Fh (u(X Q )) = °- (E Fh [u{X f] - E Fh [u(X )] ' 



dV 



dF h 



1 



The analysis above indicates that an appropriate choice of V is such that Ey[^-] 
is close to 1. Again, the ideal choice would be taking V = Fh leading to zero 
variance. This choice is not feasible but nevertheless suggests selecting V as an 
approximation of Fh- The crude upper bound for the covariance term in (|3.2[) is 
valid, just as in Section [2j 

3.1. Asymptotic efficiency criteria. Asymptotic efficiency can be conveniently 
formulated in terms of a limit criteria as a large deviation parameter tends to 
infinity. As is customary in problems related to rare-event simulation the problem 
at hand is embedded in a sequence of problems, indexed by n = 1,2, ... . The 
general setup is formalized as follows. 

Let (X 1 -"'),^! be a sequence of random variable with X^ having distribution 
F^ n K Let ft, be a non-negative function, integrable with respect to F^ n \ for each n. 
Suppose 

flW = E[/i(lW)] = J h{x)dF^ n \x) -y 0, 

as n — > oo. The problem is to compute 8^ for some large n. 

Denote by F^ the distribution with dF^/dF^ = h/O^. For the nth prob- 
lem, a Markov chain (X^)JsJ with invariant distribution F^ is generated by 
an MCMC algorithm. The estimator of C^™* 1 = is based on a probability 

distribution V^ n \ such that <C F^, with known density with respect to F^ n \ 
An estimator Qjf* of £ is given by 



Tin) 

ST 



f^Hx^), 



t=0 



where 



, n 1 dV^I 



The heuristic efficiency criteria in Sections [2] can now be rigorously formulated 
as follows: 

(1) Rare-event efficiency: Select the probability distributions yW such that 

{6 {n) f Var p( „)(M (n) (A)) ->•{), as n -> oo. 

(2) Large sample size efficiency: Design the MCMC sampler, by finding an ap- 
propriate Gibbs sampler or a proposal density for the Metropolis-Hastings 
algorithm, such that, for each n > 1, the Markov chain (X^)t>o is geo- 
metrically ergodic. 
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Remark 3.1. The rare-event efficiency criteria is formulated in terms of the ef- 
ficiency of estimating by Cr™' ) . If one insists on studying the mean and 
variance of = (Cr^) -1 , then the effects of the transformation x x~ x must 
be taken into account. For instance, the estimator 9 T is biased and its variance 
could be infinite. 



4. A RANDOM WALK WITH HEAVY- TAILED STEPS 

In this section the estimator introduced in Section [5] is applied to compute the 
probability that a random walk with heavy-tailed steps exceeds a high threshold. 

Let Yi, . . . , Y n be independent and identically distributed random variables with 
common distribution Fy and density fy with respect to Lebesgue measure. Con- 
sider the random walk S n = Y± + • ■ • + Y n and the problem of computing the 
probability 

P {n) = P(S n > a n ), 

where a n — » oo sufficiently fast that p( n > — > as n — » oo. 

It is convenient to denote by Y^™) the ?i-dimcnsional random vector 



Y(") = (Y 1 ,...,Y„) T , 



and the set 



A n = {y G R n : l T y > a n }, 
where 1 = (1, . . . , 1) T G R™ and y = {y\, . . . , y n ) T ■ With this notation 

p< n > = P(S n > a n ) = P(1 T Y(") > On) = P(Y< n > G An). 
The conditional distribution 



4" ) (.) = P(Y (n) e-|Y( n )eA, i ) I 



has density 



dF { /\ . Y\%ifY{y j )i{yi + --- + y n >a n } 

-^-(W.-.fc) = W) • (4-1) 

The first step towards defining the estimator of to construct the Markov 

chain (Yj )t>o whose invariant density is given by (|4.ip using a Gibbs sampler. In 

_ (n) 

short, the Gibbs sampler updates one element of Yj at a time keeping the other 
elements constant. Formally the algorithm proceeds as follows. 

Algorithm 4.1. Start at an initial state Yq = (Yq^, . . . , Yo, n ) where Yq^ + • • • + 

Yo t n > a n . Given Y^ = (Yt,i, . . . , Yt, n ), for some t = 0, 1, . . ., the next state Yj™^ 
is sampled as follows: 

(1) Draw ji, . . . ,j n from {1, . . . ,n} without replacement and proceed by up- 
dating the components of Y^ in the order thus obtained. 

(2) For each k = 1, . . . , n, repeat the following. 

(a) Let j = jk be the index to be updated and write 



Yt,^ = (Y M , . . . , Ytj-u Yt, j+ i, • ■ • , Y t ,„). 
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Sample Y/ ^ from the conditional distribution of Y given that the sum 
exceeds the threshold. That is, 

P(y/ J e • I Y t ,_,.) =p(y e ■ | Y + J2 Y t.k > a n ) . 

(b) Put Y' t = (Yt,i, • ■ • > Ytj-ijY^p Y t j + x, . . . , Ft,n) T ' 
(3) Draw a random permutation 7r of the numbers {1, . . . , n} from the uniform 
distribution and put = {Y^ (1) , Y^ {n) ). 

Iterate steps (l)-(3) until the entire Markov Chain (Yj™- 1 )^) 1 is constructed. 

Remark 4.2. In the heavy-tailed setting the trajectories of the random walk lead- 
ing to the rare event are likely to consist of one large increment (the big jump) 
while the other increments are average. The purpose of the permutation step is to 
force the Markov chain to mix faster by moving the big jump to different locations. 
However, the permutation step in Algorithm 14. H is not really needed when consid- 
ering the probability P(S n > a n ). This is due to the fact that the summation is 
invariant of the ordering of the steps. 

The following proposition confirms that the Markov chain (Y( n ') t >o, generated 
by Algorithm 14. 11 has as its invariant distribution. 

Proposition 4.3. The Markov chain (Yj™^) t >o, generated by Algorithm \4-l\ has 

(n) 

th i conditional distribution as its invariant distribution. 

Proof. The goal is to show that each updating step (Step 2 and 3) of the algorithm 
preserves stationarity. Since the conditional distribution Fjf 1 is permutation in- 
variant it is clear that Step 3 preserves stationarity. Therefore it is sufficient to 
consider Step 2 of the algorithm. 

Let Pj(y,-) denote the transition probability of the Markov chain (Yj )t>o 
corresponding to the jth component being updated. It is sufficient to show that, 
for all j ~ 1, . . . , m and all Borel sets of product form Bi X • • • X B n C A n) the 
following equality holds: 

x ■ ■ • x B n ) = E F („, [P,(Y, B t x ■ ■ • x B n )). 
Observe that, because B\ x • • ■ x B n c A n , 

n 

F { ^(B l x •• • x B n ) = E[ J] I{Y k G B k } | Y 1 + ■■ ■ +Y n > a n 

k=l 

_ E[J{y,- £ B J }I{Y l + ... + Y n > a n } nr^j J { y fc ^ B k }] 
P(Y 1 + ---+Y n >a n ) 

[ E[I{Y J &B J }\Y ] >a rl -(Y 1 + ---+Y^ 1 +Y ] + 1 + ...Y n ),Y u ...,Y^ u Y ] + u ...Xr l }n^ J I {YkeB k } - 
_ P(Y ] >a n -{Y 1 + -+Y 3 - 1 +Y J + 1 + ...Y n )\Y 1 ,...,Y j - 1 .Y 3 + 1 ,...,Y n ) 

~ P(Yi + ---+Y n >a„) 

_ E[P,-((Yi, . . . , Y n ) T , B, x---xB n ) I{Y k e B k }} 
P(Y 1 + ■ ■ ■ + Y n > O 
= E[P,-((Y l5 . . . , Y n ) T ,B 1 x ■ • • x B n ) | Yt + ■ ■ ■ + Y n > a n ] 
= E F („ ) [P,-(Y,S 1 x-xB„)|. 
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□ 

As for the ergodic properties, Algorithm 14.11 produces a Markov chain which is 
uniformly ergodic. 

Proposition 4.4. For each n > 1, the Markov chain (Yj"^)t>o is uniformly er- 
godic. It satisfies the following minorization condition: there exists S > such 
that 

P(Y{ n ^B\Y^=y)>SF^{B), 
for all y G A n and all Borel sets B C A n . 

Proof. Take an arbitrary n > 1. Uniform ergodicity can be deduced from the 
following minorization condition (see [16]): there exists a probability measure v, 
5 > 0, and an integer to such that 

P(Y t ( : l) G B I Y< n) = y) > Sv(B), 

for every y G A„ and Borel set B C A n . Take y G A„ and write g( ■ | y) for 
the density of P(Y^"' ) G • | Yq = y). The goal is to show that the minorization 
condition holds with to = 1, S = /nl, and v = F^ 1 ' . 

For any x G A n there exists an ordering ji, . . . ,j n of the numbers {1, . . . , n\ 
such that 

Vji S ^ji j ■ • ■ j Dj k S x j k > > x jk+i j • • • -> -"jn ' 

for some k G {0, . . . , n}. The probability to draw this particular ordering in Step 1 
of the algorithm is at least 1/nl. It follows that 

, I y ) > i fy^h) 1 ^]! > a n - Ei^ji Vi\ 

~ n! ^r(on-Ei#jifi) 

^(On-Ei^j^J/i-^i) 
MgjnKigjn > an - 3^ - ■ ■ ■ } 

_Fy(a„ - 3fj, - . . .x^.J 

By construction of the ordering ji , . . . , j„ all the indicators are equal to 1 and the 
expression in the last display is bounded from below by 

In,, , P {n) n"=l fY{Xj)I{xi + ---+ Xn > a n } 
— T II friXj) = — J - 7n) ■ 

nl xx ft! »w 

i=i 

The proof is completed by integrating both sides of the inequality over any Borel 
set B c A n . □ 

Note that so far the distributional assumption of steps Y\, . . . , Y n of the random 
walk have been very general. The only assumption has been the existence of a 
density. For the rare-event properties of the estimator the design of V^ n > is essential 
and this is where the distributional assumptions become important. In this section 
a heavy-tailed random walk is considered. To be precise, assume that the variables 
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Yi, . . . , Y n are nonnegative and that the tail of Fy is heavy in the sense that there 
is a sequence (a n ) of real numbers such that 

lim > °"> = 1, (4.2) 

r^oo P(M„ > a n ) y ' 

where M n denotes the maximum of Y\ , . . . , Y n . The class of distributions for which 
(|4.2p holds is large and includes the subexponential distributions. General condi- 
tions on the sequence (a n ) for which (I4.2|) holds are given in [7]. For instance, if 
Fy is regularly varying at oo with index fj > 1 then (I4.2[) holds with a n = an, for 
a > 0. 

Next consider the choice of V^ n \ As observed in Section[2]a good approximation 
to the conditional distribution F^ 1 ' is a candidate for V^ n \ For a heavy-tailed 
random walk the "one big jump" heuristics says that the sum is large most likely 
because one of the steps is large. Based on the assumption (|4.2[) a good candidate 
for y( n ) is the conditional distribution, 

7 (n)(. )=P ( Y (») G .\ Mn >a n ). 
Then has a known density with respect to F^ n '(-) = P(Y( n ' e •) given by 
dVW 1 1 

dFM (y) = P(M n>0n ) f{y : > an} = l-F Y ( a y {y ■ y ^ > a " } - 

The estimator of q( n > — P(S„ > a„) _1 is then given by 

# ) = * E (Y t (n) ) - T-T^r • i E '{V?=^ > a„} (4.3) 



T^dF( n > y 1 ' l-F Y (a n ) n T 

t=0 y * t=0 

where (Y^ )t>o is generated by Algorithm 14. II Note that the estimator (|4.3p can 
be viewed as the asymptotic approximation (1 — Fy(a n ) n )^ 1 of (p^™') _1 multiplied 
by the random correction factor y ^1=0 ^{^J=i^t.j > a n }. The efficiency of this 
estimator is based on the fact that the random correction factor is likely to be close 
to 1 and has small variance. 

Theorem 4.5. Suppose that (|4.2p holds. Then the estimator qffl in (|4.3[) has 

vanishing normalized variance for estimating (p*-™)) -1 . That is, 

lim (p(»)) 2 Var F( „,(g < T " ) ) = 0. 

n— too A n 

Proof. With w (n) (y) = x _ F ^/ an)n H^iVj > a n} it follows from (O that 
(p(")) 2 Var F („,( U (")(Y(™))) 

" ^ Var F( „,(/{Y: V^Yj > a n }) 



P(S n 


> a n ) 2 . 


P(M n 


> a„) 2 


P(S n 


> a n f ^ 


P(M n 


> a n ) 2 


P(S n 


> On) / 


P(M n 





I S n > a n ) 



1 P(M n > a n ) ] () 

P(S„ > O n ) 



This completes the proof. □ 
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Remark 4.6. Theorem 14.51 covers a wide range of heavy-tailed distributions and 
even allows the number of steps to increase with n. Its proof is elementary. This is 
in sharp contrast to the existing proofs of efficiency (bounded relative error, say) 
for importance sampling algorithms that cover less general models and tend to be 
long and technical, see e.g. |8j 5]H]. It must be mentioned, though, that Theorem 
14.51 proves efficiency for computing (p*-™- 1 ) -1 , whereas the authors of [51 013] prove 
efficiency for a direct computation of p^ 71 ' . 

4.1. An extension to random sums. In application to queueing and ruin theory 
there is particular interest in sums consisting of a random number of heavy-tailed 
steps. For instance, the stationary distribution of the waiting time and the workload 
of an M/G/l queue can be represented as a random sum, see Asmussen (2003), 
Theorem 5.7. The classical Cramer-Lundberg model for the total claim amount 
faced by an insurance company is another standard example of a random sum. In 
this section Algorithm 14.11 is modified to efficiently estimate hitting probabilities 
for heavy-tailed random sums. 

Let Yi, K2, ■ • ■ be non- negative independent random variables with common dis- 
tribution Fy and density fy. Let (iV^)n>i be integer valued random variables 

independent of Yi, Y2, Consider the random sum S N ( n ) = Yi -I h Y^w and 

the problem of computing the probability 



where a n — > 00 at an appropriate rate. 

Denote by the vector (N^ n \Yi, . . . , Y N ( n) ) T . The conditional distribution 
(„) 

of Y given S^o) > a« is given by 



A Gibbs sampler for sampling from the above conditional distribution can be 
constructed essentially as in Algorithm 14.11 The only additional difficulty is to up- 
date the random number of steps in an appropriate way. In the following algorithm 
a particular distribution for updating the number of steps is proposed. To ease the 
notation the superscript n is suppressed in the description of the algorithm. 

Algorithm 4.7. To initiate, draw Nq from P(A S ■) and 1q,i 5 ■ • ■ , io.Af such that 
Y0.1 + • • • + Yo t N > a n- Each iteration of the algorithm consists of the following 

steps. Suppose Y t = {k t ,y t ,i, ■ ■ ■ ,Vt,k t ) witn Vt,i H \- Vt,k t > a n . Write = 

min{j : y t ,i H h yt,j > a n }. 

(1) Sample number of steps A t+1 from the distribution 



If N t +i > k t , sample Y t +i t k t +i, ■ ■ ■ ,^t+i,Af t+1 independently from Fy and 

put Y t X ^ = (Yt t i, . . . , Y tj k t ,Y t+ i t k t +i, • ■ • , Y t+ i tNt+1 ). 
(2) Proceed by updating all the individual steps as in Algorithm 14. II 

(a) Draw j±, ... ,jN t+1 from {1, . . . , A t _|_i} without replacement and pro- 
ceed by updating the components of Yj ^ in the order thus obtained. 

(b) For each ft = 1, ... , Nt+i, repeat the following. 



P 



■< n > = P(5 W( „, > o»), 



P(N^ =k,(Y 1 ,...,Y k ) g • I SjfM >a n ) 
= P((Yj, . . . , Y k ) £ ■ , S k > an)P(jV (n) = k) 

p(") 



P(kt+i I K) 



p(n = k t+1 )i{k t+1 > k;} 

P(N > kt) 
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(i) Let j — jk be the index to be updated and write 
1 t-j — \ I t,i > • • ■ ) x t,j-i> *t,j+v • • • > 1 t,N t+l >- 

(2) 

Sample Y^j from the conditional distribution of Y given that 
the sum exceeds the threshold. That is, 

P(*i 2) e ■ | Y&) = p(y G • I Y + J2 Y S > a, 

(ii) Put y (2) - (Y (1) y (1) y (2) r (1) y (1) F 

(c) Draw a random permutation 7r of the numbers {1, . . . , N t +i} from the 
uniform distribution and put Y t+ i = (Nt+i^J^,. . . , Y t y J^ Nt+i) ). 

Iterate until the entire Markov Chain (Y^^Jq 1 is constructed. 

Proposition 4.8. The Markov chain (Y t )t>o generated by Algorithm ^ . 7| has the 
conditional distribution P((AT, Yt, . . . , Yjv) £ ■ \ Yi + . . . Yn > a n ) as its invariant 
distribution. 

Proof. The only essential difference from Algorithm 14.11 is the first step of the 
algorithm, where the number of steps and possibly the additional steps are updated. 
Therefore, it is sufficient to prove that the first step of the algorithm preserves 
stationarity. The transition probability of the first step, starting from a state 
(kt,yt,i,---,yt,kt) witn k t = min{j : y tl + ■■■ + y tjj > a n }, can be written as 
follows. 

P {1) {kt,y t ,i, ■ ■ .,y t ,k t 'i k t+i,Ai x • ■ • x A kt+1 ) 
= P(JV t+ i - k t +i,(Y tA ,...,Y t>kt+1 ) g Ai x ••■ x A kt+1 

| N t = kt,Y t>1 = y t< i, . . . , F t ,fe, = Vt,k t ) 
= I p{h+i | K) ]lfc=i ^{yt.fc e A k}, h+i < h, 

~ \ P (k t+1 1 k* t ) nti nvt, k g Afcin^+i *ha), *t+i > *t. 

Consider the stationary probability of a set of the form {fc t+ i} x A\ x ■ ■ • x A kt+1 . 
It holds that 

E,[pW(JV t ,Kt,i,...,lt l iv t ;*t+i,Ai x ■•• x A kt+1 )} 
= p , g \ rE[pW(2V,yi,...,y JV ;* t+1 ,Ai x •■• x A kt+1 )I{S N > a n }} 



1 oo 



fet=l 



x pW(k t ,Y 1 ,...,Y kt ;kt + i,A 1 x ••• x A fct+1 )/{5 fci > a„}. 
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With B k , = {(yi,y 2 , ■■■): min{j : yi + ■ ■ ■ + Vj > a} = k*}, A® = Ax x • • • x A kt , 
and Af = A\ x • • • x A kt+1 the expression in the last display can be written as 

1 / kt + 1 

E = fc *) 
.fc t =i 

fe* 



P(S N > a n ) 

x E [ J2 H Pi , • • • , Y kt ) e B k , }P« (kt , ii , . . . , Y kt ; k t+1 , A® +i ) 



fc*=i 

oo 

+ E = fc *) 

fc t =fet+i + l 

fct+1 

xE[^/{(y lr .,n t+1 )eB t . }p« (fc* , Y lt . . . , Y kt ■ kt+i , Af t+i ) 
fc*=l / 

Inserting the expression for PW the last expression equals 

fct fet+1 

x^P((F ll ...,n t )eB fe ,n4f>(fc t+1 |r) J] F y (A,-) 
fe*=i i=fc t +i 

oo fet+l \ 

+ E P(JV = fet)£p((Ki,...,n t+1 )eB fc .nA® + Xfc t+1 |A ! *) . 
fet=fe t +i+i fe*=i / 

Changing the order of summation the last expression equals 

/ fet+i fet+i 



P(^>«n)V J fe fa ti.' 

fet + i 

xP((y 1 ,..,4)e5 l ,n<) P (t t+ iin [] Py(^) 

j'=fe t +i 

fet+l oo \ 

+ E E P(^-fc t )P((^i,---,n t+1 )eP fe .n< +i )p(fc t+1 |fc*) . 

k*=i fe t =fe t+ i+i / 

Since P{(Y u ...,Y kt ) e P fe , n A®) Uj% +1 M^) = P((Y U . . . ,Y kt+1 ) e B k . n 
^ fet+i) ^ e ^ as * ex P ress i° n equals 

1 / fet+i fet+i 

E E p(^ = *t)p((^i,---,n t+1 )GBfc.nA® + Xfc t+1 \k*) 



fet+l oo \ 

+ E E P(^ = fct)P((n,...,n t +i)e^,nA| +i )p(fc t+1 l a*) . 



fe*=l fe t =fe t +l+l 
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P(S N > a n ) 



Summing over kt the last expression equals 

P((Yu. . . ,Y kt+1 ) G B k , n Af t+ Jp(k t+ i | fc*)P(fc* < N < k t+1 ) 

, fe*=i 

kt+l \ 

+ £ p((y, . . . , Y kt+1 ) g A*, n Af t+1 )p(fe t+1 | k*)P(N > h+i + 1) . 

fc«=i / 

From the definition of p(&t+i | fc*) it follows that the last expression equals 

vfq - ; E P((n,... l lWi)eflt-nA| ; )Kfc t +i|fe*)P(iV>fc*) 
r^OjV > a n ) £rz 1 

= P(JV = k t+1 , (Yx, . . .,Y kt+1 ) G | y x + ••• + Y N > a n ), 

which is the desired invariant distribution. This completes the proof. □ 

Proposition 4.9. The Markov chain (Y t ) t >o generated by Algorithm \4-7\ is uni- 
formly ergodic. It satisfies the following minorization condition: there exists S > 
such that 

P(Yi GB|Y =y)> SP((N, Y u . . . , Y N ) G B \ Yi + • • • + Y N > a n ), 
for all y G A = U k >i{(k, y\, . . . , y/c) : yi -I \-y k > a n } md all Borel sets B <Z A. 

The proof requires only a minor modification from the non-random case, Propo- 
sition 231 and is therefore omitted. 

Next consider the distributional assumptions and the design of V^ n \ The main 
focus is on the rare event properties of the estimator and therefore the large de- 
viation parameter n will be suppressed to ease notation. Let the distribution of 
the number of steps P(N^ G •) to depend on n. By a similar reasoning as in 
the case of non-random number of steps the following assumption are imposed: the 
variables N^ n \ Y\, Y%, . . . and the numbers a n are such that 

P(y H h Y Ni n) > On) , .s 

hm — — — = 1, (4.4) 

n->oo P(M Ar( „) < On) 

where M k — maxfy, . . . , Y k }. Note that the denominator can be expressed as 

oo 

P(%.) > an) = J2 P ( M k > a n )P(N^ = k) 

fc=i 



J2^-F Y (a n ) k ]P(N^ =k) 



fc=i 



= 1 - 9N(«)( F Y(a n )), 

where <?jv( n >(X) = E^ 1 ™'] is the generating function of N^ n \ Sufficient conditions 
for (|4.4| to hold are given in [13], Theorem 3.1. For instance, if Fy is regularly 
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varying at oo with index /3 > 1 and has Poisson distribution with mean 

X n — > oo, as n — > oo, then (|4.4j) holds with a n = aA„, for a > 0. 

Similarly to the non-random setting a good candidate for is the conditional 
distribution, 

V(")(.)=P(Y ( " 3 e- \M NM >a n ). 
Then has a known density with respect to F^ n \-) = P(Y ™ € •) given by 
dV^ 1 

1 



1 - 9N^){ F Y{a n )) 
The estimator of tp") = P(S n > a„) _1 is given by 

l^dVM^n). 1 



I{{yi,---,Vk) : Vj=iVj > an}- 



T 



t=o 



where (Y t ) t >o is generated by Algorithm 14.71 

Theorem 4.10. Suppose (14.4[) holds. The estimator gf^ in (|4.5I) has vanishing 
normalized variance. That is, 

lim (p^) 2 Var T „(^ n) ) =0, 

71— S-OO 

where ir n denotes the conditional distribution P(Y^ G • | S n m > a n ). 

Remark 4.11. Because the distribution of may depend on n Theorem 14.101 
covers a wider range of models for random sums than those studied in [8j 0] where 
the authors present provably efficient importance sampling algorithms. 

Proof. Since = P(S N (n) > a n ) and 

.<"><*.* 

P(Af Ar( „ ) > o n ) 

it follows that 
[p(")] 2 Var^(u(")(Y (n) )) 

^„(/{vfj;V, >a„}) 



0, 





> a n f . 


P(M W( „ 


) > a n ) 2 


P(S wM 


> a n ) 2 ^ 


P(Mjv(„ 


) > a n ) 2 


P(^iV<") 


> a n ) / 


P(M N(n 


> > a„) V 



by (I4.4p . This completes the proof. □ 



18 



T. GUDMUNDSSON AND H. HULT 



5. Numerical experiments 

In this section the performance of the estimator = ((Z^)^ 1 with as 
in (|4.3I) is illustrated numerically. The literature includes numerical comparison 
for many of the existing algorithms. In particular in the setting of random sums. 
Numerical results for the algorithms by Dupuis et al. [8], the hazard rate twist- 
ing algorithm by Juneja and Shahabuddin [12], and the conditional Monte Carlo 
algorithm by Asmussen and Kroese [3] can be found in [8]. Additional numerical 
results for the algorithms by Blanchet and Li [3], Dupuis et al. [8], and Asmussen 
and Kroese [3] can be found in [4. From the existing results it appears as if the 
algorithm by Dupuis et al. [8] has the best performance. Therefore, we only include 
numerical experiments of the MCMC estimator and the estimator in [8], which is 
labelled (IS). 

By construction each simulation run of the MCMC algorithm only generates a 
single random variable (one simulation step) while both importance sampling and 
standard Monte Carlo generate n number of random variables (n simulation steps) 
for the case of fixed number of steps ( N + 1 in the random number of steps case) . 
Therefore the number runs for the MCMC is scaled up to get a fair comparison of 
the computer runtime between the three approaches. 

First consider estimating P(5„ > a n ) where S n = Y% + ■■ ■ + Y n with Y\ having 
a Pareto distribution with density fy(x) = (3(x + 1)~ ,3 ~ 1 for x > 0. Let a n = 
an. Each estimate is calculated using b number of batches, each consisting of T 
simulations in the case of importance sampling and standard Monte Carlo and Tn 
in the case of MCMC. The batch sample mean and sample standard deviation is 
recorded as well as the average runtime per batch. The results are presented in 
Table [1] The convergence of the algorithms can also be visualized by considering 
the point estimate as a function of number of simulation steps. This is presented in 
Figure [TJ The MCMC algorithm appears to outperform the importance sampling 
algorithm consistently for different choices of the parameters. The improvement 
over importance sampling appears to increase as the event becomes more rare. 
This is due to the fact that the asymptotic approximation becomes better and 
better as the event becomes more rare. 

Secondly consider estimating P(Sn > a p ) where Sn = Yi H + Y)v with N 

Geometrically distributed P(iV = k) = (1 — p) k ~ x p for k = 1,2,... and a p — 
aE[N] — a/ p. The estimator considered here is pr — ($t) _1 with as in (|4.5[) . 
Again, each estimate is calculated using b number of batches, each consisting of 
T simulations in the case of importance sampling and standard Monte Carlo and 
TE[N] in the case of MCMC. The results are presented in Tabled Also in the 
case of random number of steps the MCMC algorithm appears to outperform the 
importance sampling algorithm consistently for different choices of the parameters. 

We remark that in our simulation with p = 0.2, a = 5 ■ 10 9 the sample standard 
deviation of the MCMC estimate is zero. This is because we did not observe any 
indicators J^y^—iVtj > a p } being equal to in this case. 
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Figure 1. The figure illustrates the point estimate of P(S n > a n ) 
as a function of the number of simulation steps, with n = 5, a = 10, 
(3 = 2. The estimate generated via the MCMC approach is drawn 
by a solid line and the estimate generated via IS is drawn by a 
dotted line. 
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Table 1 . The table displays the batch mean and standard devia- 
tion of the estimates of P(S n > a n ) as well as the average runtime 
per batch for time comparison. The number of batches run is 
b, each consisting of T simulations for importance sampling (IS) 
and standard Monte Carlo (MC) and Tn simulations for Markov 
chain Monte Carlo (MCMC). The asymptotic approximation is 
P(max{Yi,...,Y n } > a n ). 



b = 25, T = 10 b , = 


= 2, n = 5, a 


= 5, p max = 0.737e-2 




MCMC 


IS 


MC 


Avg. est. 


1.050e-2 


1.048c-2 


1.053e-2 


Std. dev. 


3e-5 


9e-5 


27e-5 


Avg. time per batch(s) 


12.8 


12.7 


1.4 


b = 25, T = 10 & , P = 


2, n = 5, a = 


- 20, Pmax — 


4.901e-4 




MCMC 


IS 


MC 


Avg. est. 


5.340e-4 


5.343e-4 


5.380e-4 


Std. dev. 


6e-7 


13e-7 


770e-7 


Avg. time per batch(s) 


14.4 


13.9 


1.5 


b = 20, T = 10 b , P = 2, n = 5, a = 


10 3 , _p max = 


1.9992c-7 




MCMC 


IS 




Avg. est. 


2.0024c-7 


2.0027c-7 




Std. dev. 


3e-ll 


20e-ll 




Avg. time per batch(s) 


15.9 


15.9 




b = 20, T = 10 b , P = 2, n = 5, a = 


10 4 , p max = 1.99992C-9 




MCMC 


IS 




Avg. est. 


2.00025c-9 


2.00091c-9 




Std. dev. 


7e-14 


215e-14 




Avg. time per batch(s) 


15.9 


15.9 




b= 25, T= 10 b , P = 2, n = 20, a = 


= 20, j? max — 


1.2437e-4 




MCMC 


IS 


MC 


Avg. est. 


1.375e-4 


1.374e-4 


1.444e-4 


Std. dev. 


2e-7 


3c-7 


492e-7 


Avg. time per batch(s) 


52.8 


50.0 


2.0 


b = 25, T = 10 5 , p = 2 


, n = 20, a = 


200, p max = 


1.2494e-6 




MCMC 


IS 


MC 


Avg. est. 


1.2614c-6 


1.2615e-6 


1.2000c-6 


Std. dev. 


4e-10 


12e-10 


33,166e-10 


Avg. time per batch(s) 


49.4 


48.4 


1.9 


b = 20, T = 10 b , p = 2, n = 20, a = 


10 3 , Pmax 


4.9995e-8 




MCMC 


IS 




Avg. est. 


5.0091c-8 


5.0079e-8 




Std. dev. 


7e-12 


66e-12 




Avg. time per batch(s) 


53.0 


50.6 




6 = 20, T= 10 b , P = 2, 


n = 20, a = 


10 i J*max — 


5.0000e-10 




MCMC 


IS 




Avg. est. 


5.0010c-10 


5.0006e-10 




Std. dev. 


2e-14 


71e-14 




Avg. time per batch(s) 


48.0 


47.1 
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Table 2. The table displays the batch mean and standard devia- 
tion of the estimates of P(<SW > a p ) as well as the average runtime 
per batch for time comparison. The number of batches run is b, 
each consisting of T simulations for importance sampling (IS) and 
standard Monte Carlo (MC) and T~E[N] simulations for Markov 
chain Monte Carlo (MCMC). The asymptotic approximation is 
Pmax = P(max{Yi, . . . , Y N } > a p ). 



b= 25, T= 10 b , [3 = 


1, p= 0.2, a -- 


— 10 , Pmax — 


0.990c-2 




MCMC 


IS 


MC 


Avg. est. 


1.149e-2 


1.087e-2 


1.089e-2 


Std. dev. 


4e-5 


6e-5 


35e-5 


Avg. time per batch(s) 


25.0 


11.0 


1.2 


b= 25, T= 10 b , (3 = 


1, p= 0.2, a -- 


= 10 3 , p max = 


0.999e-3 




MCMC 


IS 


MC 


Avg. est. 


1.019e-3 


1.012e-3 


1.037e-3 


Std. dev. 


le-6 


3e-6 


76e-6 


Avg. time per batch(s) 


25.8 


11.1 


1.2 


b = 20, T = 10 b , /3 = 1, p = 0.2, a = 5 


10 , Pmax 


2.000000e-8 




MCMC 


IS 




Avg. est. 


2.000003e-8 


1.999325e-8 




Std. dev. 


6e-14 


1114e-14 




Avg. time per batch(s) 


385.3 


139.9 




6= 20, T= 10 b , (3 = 1, 


p = 0.2, a = 5 • 10 y , p max = 


2.0000c-10 




MCMC 


IS 




Avg. est. 


2.0000c-10 


1.9998e-10 




Std. dev. 





13e-14 




Avg. time per batch(s) 


358.7 


130.9 




b = 25, T = 10 b , (3 = 


1, p = 0.05, a 


— 10 , Pmax = 


0.999e-3 




MCMC 


IS 


MC 


Avg. est. 


1.027e-3 


1.017e-3 


1.045e-3 


Std. dev. 


le-6 


4e-6 


105e-6 


Avg. time per batch(s) 


61.5 


44.8 


1.3 


6= 25, T= 10 5 , P = 1, 


p = 0.05, a = 


5 • 10 5 , Pmax : 


= 1.9999e-6 




MCMC 


IS 


MC 


Avg. est. 


2.0002c-6 


2.0005e-6 


3.2000c-6 


Std. dev. 


le-10 


53e-10 


55,678e-10 


Avg. time per batch(s) 


60.7 


45.0 


1.3 



